To Yin Yu
Climate change has long been a very important problem for humanity, and many believe it is driven by greenhouse gas emissions, with extensive shifts in weather patterns happening around the world. The aim of this tutorial is to understand the relationship between greenhouse gas emissions and temperature, and to analyze the factors that might affect greenhouse gas emissions. The greenhouse gas emissions data we will be using are collected from Our World in Data, and the temperature data are collected from Berkeley Earth. Our World in Data is a well-known organization that aims to research data that can help tackle the world's largest problems. For more information, visit https://ourworldindata.org/co2-and-other-greenhouse-gas-emissions. Berkeley Earth, on the other hand, aims to supply comprehensive and highly user-accessible data that might help explain the climate change problem. For more information about them, visit http://berkeleyearth.org/. We will be using data from 1990 to 2012 to perform our analysis.
For this tutorial, the pandas, numpy, and pycountry libraries will be used to read and organize the collected data, matplotlib and plotly.express will be used for visualization, and scikit-learn will be used for the analysis. This tutorial assumes some prior knowledge of Python.
Data collection is the first stage in the data lifecycle. In this stage, our main aim is to gather the data. To download the greenhouse gas emissions dataset yourself, visit https://github.com/owid/co2-data. The global temperature dataset is available at https://www.kaggle.com/berkeleyearth/climate-change-earth-surface-temperature-data; we will be using the "GlobalLandTemperaturesByCountry" dataset for this tutorial. Both datasets are in CSV format.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import linear_model
import plotly.express as px
import pycountry
# greenhouse gas emissions dataset
greenhouse_emissions = pd.read_csv("Green_house_emission_data.csv")
greenhouse_emissions.head(10)
#global average temperature dataset
global_land_temps = pd.read_csv("GlobalLandTemperaturesByCountry.csv")
global_land_temps.head(10)
We have now loaded all the data we need for the rest of the tutorial. However, the data is still unorganized and hard to read. Hence, we move on to the next stage of the cycle, which is data processing. In this stage, we aim to reorganize the data and make it as easy to understand as possible, which will help prepare for the next stage. We will drop the columns we do not expect to use, and possibly rename columns or add new ones, in order to make the dataset as tidy and organized as possible.
# cleaning emissions data (keep only 1990-2012 and the columns we need)
greenhouse_emissions.drop(greenhouse_emissions[(greenhouse_emissions['year'] > 2012)].index, inplace = True)
greenhouse_emissions.drop(greenhouse_emissions[(greenhouse_emissions['year'] < 1990)].index, inplace = True)
greenhouse_emissions = greenhouse_emissions.filter(['iso_code', 'country', 'year', 'co2', 'methane', 'nitrous_oxide'])
greenhouse_emissions.head(20)
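The year filter above can also be written as a single boolean mask. The following is a minimal sketch on a made-up toy frame; `toy_emissions` and its values are assumptions for illustration, not the real dataset. It only shows the idea of keeping the 1990 to 2012 range and a subset of columns.

```python
import pandas as pd

# Toy stand-in for the emissions table; the values are made up for illustration.
toy_emissions = pd.DataFrame({
    "country": ["A", "A", "A", "A"],
    "year": [1989, 1990, 2012, 2013],
    "co2": [1.0, 2.0, 3.0, 4.0],
    "unused": [0, 0, 0, 0],
})

# Series.between is inclusive on both ends by default, so 1990 and 2012 are kept.
in_range = toy_emissions["year"].between(1990, 2012)

# Keep only the rows in range and the columns of interest.
toy_filtered = toy_emissions.loc[in_range, ["country", "year", "co2"]].reset_index(drop=True)

print(toy_filtered)
```

Building a new frame this way avoids modifying the original in place, which can make it easier to re-run a cell without reloading the CSV.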
#cleaning temperature data
#reorganizing temperature data
global_land_temps = global_land_temps.rename(columns={'dt': 'Date'})
global_land_temps['Date'] = pd.to_datetime(global_land_temps['Date'] , format='%Y-%m-%d')
global_land_temps.head(15)
#removing unrelated rows(only keeping 1990-2012 data) for temperature data
global_land_temps['Year'] = pd.DatetimeIndex(global_land_temps['Date']).year
global_land_temps.drop(global_land_temps[(global_land_temps['Year'] > 2012)].index, inplace = True)
global_land_temps.drop(global_land_temps[(global_land_temps['Year'] < 1990)].index, inplace = True)
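# transform monthly data into annual data by taking the mean
# numeric_only=True averages only the numeric columns (the Date column is left out)
annual_avg_temp = global_land_temps.groupby(['Year','Country']).mean(numeric_only=True).reset_index()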
exclude = ['Asia', 'Bonaire, Saint Eustatius And Saba']
annual_avg_temp = annual_avg_temp[~annual_avg_temp['Country'].isin(exclude)]
annual_avg_temp.head(15)
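To make the monthly-to-annual step concrete, here is a small sketch on a toy frame. The frame `toy_monthly` and its temperatures are made up for illustration; only the column names follow the temperature dataset used above.

```python
import pandas as pd

# Toy monthly readings for one country over two years (made-up values).
toy_monthly = pd.DataFrame({
    "Date": pd.to_datetime(["2000-01-01", "2000-07-01", "2001-01-01", "2001-07-01"]),
    "Country": ["A", "A", "A", "A"],
    "AverageTemperature": [0.0, 20.0, 2.0, 22.0],
})

# Extract the calendar year from the datetime column.
toy_monthly["Year"] = toy_monthly["Date"].dt.year

# One row per (Year, Country), holding the mean of that year's monthly values.
toy_annual = (
    toy_monthly.groupby(["Year", "Country"], as_index=False)["AverageTemperature"].mean()
)

print(toy_annual)
```

Note that the group mean skips missing values by default, so a year with some missing months is averaged over the months that are present.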
Our temperature data does not come with an ISO-3 country code, which is needed for merging with the emissions data. Hence, we will use the pycountry library here to add a column of ISO-3 country codes.
#adding country code column
annual_avg_temp['CountryCode'] = annual_avg_temp['Country'].apply(lambda x: pycountry.countries.search_fuzzy(x)[0].alpha_3)
annual_avg_temp.head(10)
#merging temperature data with greenhouse emissions data
Combined_df = annual_avg_temp.merge(greenhouse_emissions, left_on= ['CountryCode', 'Year'], right_on = ['iso_code', 'year'])
#dropping the unnecessary/repeated column
Combined_df = Combined_df.drop(['country', 'year', 'iso_code','AverageTemperatureUncertainty'], axis=1)
Combined_df = Combined_df.replace('NaN', np.nan)
Combined_df = Combined_df.dropna()
#reorganize the column of the combined data
Combined_df = Combined_df[['Year', 'CountryCode', 'Country', 'AverageTemperature', 'co2', 'methane', 'nitrous_oxide']]
Combined_df.head(10)
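Because the merge above keeps only rows whose country code and year appear in both tables, a wrongly matched or missing code silently drops a country. One possible way to check for this, sketched here on made-up toy frames (`toy_temps` and `toy_emis` are assumptions, not the real data), is a left merge with pandas' `indicator=True` option.

```python
import pandas as pd

# Toy temperature and emissions tables; "CCC" has no emissions record.
toy_temps = pd.DataFrame({
    "CountryCode": ["AAA", "BBB", "CCC"],
    "Year": [2000, 2000, 2000],
    "AverageTemperature": [10.0, 20.0, 30.0],
})
toy_emis = pd.DataFrame({
    "iso_code": ["AAA", "BBB"],
    "year": [2000, 2000],
    "co2": [1.0, 2.0],
})

# A left merge keeps every temperature row; the _merge column records
# whether a matching emissions row was found.
checked = toy_temps.merge(
    toy_emis,
    how="left",
    left_on=["CountryCode", "Year"],
    right_on=["iso_code", "year"],
    indicator=True,
)

# Country codes present in the temperature data but missing from the emissions data.
unmatched = checked.loc[checked["_merge"] == "left_only", "CountryCode"].tolist()
print(unmatched)
```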
Now that our dataset looks much more organized and easier to read, we can move on to the next stage, which is data visualization. In this stage, we will turn the numeric values into more explanatory graphs. We will also look for potential trends in the graphs we generate, which will give us a better understanding of our data. The type of graph we are using is a choropleth world map, in which countries with higher greenhouse gas emissions are painted in darker colors and those with lower emissions in lighter colors.
We will be using the plotly.express library for the choropleth world maps.
# Amount of carbon dioxide(CO2) emissions of each country over time
fig = px.choropleth(data_frame = Combined_df,
locations= "CountryCode",
color= "co2",
hover_name= "Country",
color_continuous_scale= 'YlOrRd',
animation_frame= "Year")
fig.show()
# Amount of methane emissions of each country over time
fig = px.choropleth(data_frame = Combined_df,
locations= "CountryCode",
color= "methane",
hover_name= "Country",
color_continuous_scale= 'YlOrRd',
animation_frame= "Year")
fig.show()
# Amount of nitrous oxide emissions of each country over time
fig = px.choropleth(data_frame = Combined_df,
locations= "CountryCode",
color= "nitrous_oxide",
hover_name= "Country",
color_continuous_scale= 'YlOrRd',
animation_frame= "Year")
fig.show()
After looking at the choropleth maps, we found that some countries show more notable changes in greenhouse gas emissions over time. We decided to create line plots and scatter plots to obtain a clearer picture of the potential trends between average temperature and emissions.
These countries are China, the United States, Russia, Germany, the United Kingdom, Japan, India, Indonesia, and Brazil. The line plots and scatter plots will be created with the matplotlib library.
# line plot of average annual temperature vs time for China
china = Combined_df[Combined_df['Country'] == 'China']
china.plot(x='Year', y='AverageTemperature')
# line plot of annual greenhouse gas emissions vs time for China
china.plot(x='Year', y=['co2','methane','nitrous_oxide'])
# line plot of average annual temperature vs time for United States
us = Combined_df[Combined_df['Country'] == 'United States']
us.plot(x='Year', y='AverageTemperature')
# line plot of annual greenhouse gas emissions vs time for United States
us.plot(x='Year', y=['co2','methane','nitrous_oxide'])
# line plot of average annual temperature vs time for Russia
russia = Combined_df[Combined_df['Country'] == 'Russia']
russia.plot(x='Year', y='AverageTemperature')
# line plot of annual greenhouse gas emissions vs time for Russia
russia.plot(x='Year', y=['co2','methane','nitrous_oxide'])
# line plot of average annual temperature vs time for Germany
germany = Combined_df[Combined_df['Country'] == 'Germany']
germany.plot(x='Year', y='AverageTemperature')
# line plot of annual greenhouse gas emissions vs time for Germany
germany.plot(x='Year', y=['co2','methane','nitrous_oxide'])
# line plot of average annual temperature vs time for United Kingdom
uk = Combined_df[Combined_df['Country'] == 'United Kingdom']
uk.plot(x='Year', y='AverageTemperature')
# line plot of annual greenhouse gas emissions vs time for United Kingdom
uk.plot(x='Year', y=['co2','methane','nitrous_oxide'])
# line plot of average annual temperature vs time for Japan
japan = Combined_df[Combined_df['Country'] == 'Japan']
japan.plot(x='Year', y='AverageTemperature')
# line plot of annual greenhouse gas emissions vs time for Japan
japan.plot(x='Year', y=['co2','methane','nitrous_oxide'])
# line plot of average annual temperature vs time for India
india = Combined_df[Combined_df['Country'] == 'India']
india.plot(x='Year', y='AverageTemperature')
# line plot of annual greenhouse gas emissions vs time for India
india.plot(x='Year', y=['co2','methane','nitrous_oxide'])
# line plot of average annual temperature vs time for Indonesia
indonesia = Combined_df[Combined_df['Country'] == 'Indonesia']
indonesia.plot(x='Year', y='AverageTemperature')
# line plot of annual greenhouse gas emissions vs time for Indonesia
indonesia.plot(x='Year', y=['co2','methane','nitrous_oxide'])
# line plot of average annual temperature vs time for Brazil
brazil = Combined_df[Combined_df['Country'] == 'Brazil']
brazil.plot(x='Year', y='AverageTemperature')
# line plot of annual greenhouse gas emissions vs time for Brazil
brazil.plot(x='Year', y=['co2','methane','nitrous_oxide'])
After plotting the line plots of annual emissions and temperature versus time for all the target countries, we found some interesting trends. Not all the countries show a notable relationship between emissions and temperature; however, we are able to observe some plausible trends. For Russia, emissions drop in the earlier years and rise during the later years, and the temperature follows a similar pattern. For Germany, there is not much notable change in either temperature or emissions. In the United Kingdom's case, both temperature and emissions dropped slightly, and for Japan, both rose slightly. For India, both temperature and emissions increase quite noticeably, and the same is true for Brazil. Indonesia is an interesting case: its methane emissions were the highest of the three gases during the earlier years but dropped over time, while its CO2 emissions increased dramatically and overtook methane emissions. Its temperature is also rising. The plots for China and the United States do not show a very clear relationship, which we will analyze further in the next stage.
# scatter plot of average annual temperature vs. carbon dioxide emissions for China
china.plot(x='co2', y='AverageTemperature', kind = 'scatter')
# scatter plot of average annual temperature vs. carbon dioxide emissions for United States
us.plot(x='co2', y='AverageTemperature', kind = 'scatter')
# scatter plot of average annual temperature vs. carbon dioxide emissions for Russia
russia.plot(x='co2', y='AverageTemperature', kind = 'scatter')
# scatter plot of average annual temperature vs. carbon dioxide emissions for Germany
germany.plot(x='co2', y='AverageTemperature', kind = 'scatter')
# scatter plot of average annual temperature vs. carbon dioxide emissions for United Kingdom
uk.plot(x='co2', y='AverageTemperature', kind = 'scatter')
# scatter plot of average annual temperature vs. carbon dioxide emissions for Japan
japan.plot(x='co2', y='AverageTemperature', kind = 'scatter')
# scatter plot of average annual temperature vs. carbon dioxide emissions for India
india.plot(x='co2', y='AverageTemperature', kind = 'scatter')
# scatter plot of average annual temperature vs. carbon dioxide emissions for Indonesia
indonesia.plot(x='co2', y='AverageTemperature', kind = 'scatter')
# scatter plot of average annual temperature vs. carbon dioxide emissions for Brazil
brazil.plot(x='co2', y='AverageTemperature', kind = 'scatter')
# scatter plot of average annual temperature vs. methane emissions for China
china.plot(x='methane', y='AverageTemperature', kind = 'scatter')
# scatter plot of average annual temperature vs. methane emissions for United States
us.plot(x='methane', y='AverageTemperature', kind = 'scatter')
# scatter plot of average annual temperature vs. methane emissions for Russia
russia.plot(x='methane', y='AverageTemperature', kind = 'scatter')
# scatter plot of average annual temperature vs. methane emissions for Germany
germany.plot(x='methane', y='AverageTemperature', kind = 'scatter')
# scatter plot of average annual temperature vs. methane emissions for United Kingdom
uk.plot(x='methane', y='AverageTemperature', kind = 'scatter')
# scatter plot of average annual temperature vs. methane emissions for Japan
japan.plot(x='methane', y='AverageTemperature', kind = 'scatter')
# scatter plot of average annual temperature vs. methane emissions for India
india.plot(x='methane', y='AverageTemperature', kind = 'scatter')
# scatter plot of average annual temperature vs. methane emissions for Indonesia
indonesia.plot(x='methane', y='AverageTemperature', kind = 'scatter')
# scatter plot of average annual temperature vs. methane emissions for Brazil
brazil.plot(x='methane', y='AverageTemperature', kind = 'scatter')
# scatter plot of average annual temperature vs. nitrous oxide emissions for China
china.plot(x='nitrous_oxide', y='AverageTemperature', kind = 'scatter')
# scatter plot of average annual temperature vs. nitrous oxide emissions for United States
us.plot(x='nitrous_oxide', y='AverageTemperature', kind = 'scatter')
# scatter plot of average annual temperature vs. nitrous oxide emissions for Russia
russia.plot(x='nitrous_oxide', y='AverageTemperature', kind = 'scatter')
# scatter plot of average annual temperature vs. nitrous oxide emissions for Germany
germany.plot(x='nitrous_oxide', y='AverageTemperature', kind = 'scatter')
# scatter plot of average annual temperature vs. nitrous oxide emissions for United Kingdom
uk.plot(x='nitrous_oxide', y='AverageTemperature', kind = 'scatter')
# scatter plot of average annual temperature vs. nitrous oxide emissions for Japan
japan.plot(x='nitrous_oxide', y='AverageTemperature', kind = 'scatter')
# scatter plot of average annual temperature vs. nitrous oxide emissions for India
india.plot(x='nitrous_oxide', y='AverageTemperature', kind = 'scatter')
# scatter plot of average annual temperature vs. nitrous oxide emissions for Indonesia
indonesia.plot(x='nitrous_oxide', y='AverageTemperature', kind = 'scatter')
# scatter plot of average annual temperature vs. nitrous oxide emissions for Brazil
brazil.plot(x='nitrous_oxide', y='AverageTemperature', kind = 'scatter')
These plots add some more detail to the trends we saw in the line plots. However, at this point, we are still not certain how strong the relationships are. This leads to the next stage in the cycle, where we will try to fit a linear regression model.
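Before fitting a model, a quick numeric check of how strongly two columns move together is the Pearson correlation coefficient. The sketch below uses a made-up toy frame (`toy_country` and its values are assumptions for illustration, not results from the real data) with the same column names as our combined dataset.

```python
import pandas as pd

# Toy data for one hypothetical country (made-up values).
toy_country = pd.DataFrame({
    "co2": [1.0, 2.0, 3.0, 4.0, 5.0],
    "AverageTemperature": [10.1, 10.3, 10.2, 10.6, 10.8],
})

# Series.corr computes the Pearson correlation by default:
# values near +1 or -1 indicate a strong linear relationship, values near 0 a weak one.
r = toy_country["co2"].corr(toy_country["AverageTemperature"])

print(round(r, 3))
```

For a linear regression with a single predictor and an intercept, the coefficient of determination equals the square of this correlation, so the two measures tell a consistent story.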
As mentioned previously, we are going to use a machine learning algorithm and statistics to characterize the relationship: we will try to determine how strong the relationship between emissions and temperature is, and whether it is positive or negative. To obtain this information, we will build a linear regression model.
We use the scikit-learn library here to fit a linear regression model to our data.
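Since the same fit-and-report steps are repeated for every country and gas, they could be wrapped in one function. The following is a minimal sketch of that idea; `fit_simple_regression` is a hypothetical helper name, and the toy inputs are made up so the expected slope and intercept are easy to verify by hand.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score


def fit_simple_regression(x, y):
    """Fit y = intercept + slope * x and return summary numbers (hypothetical helper)."""
    # scikit-learn expects a 2-D feature array, hence the reshape to one column.
    X = np.asarray(x, dtype=float).reshape(-1, 1)
    y = np.asarray(y, dtype=float)
    model = LinearRegression().fit(X, y)
    pred = model.predict(X)
    return {
        "intercept": float(model.intercept_),
        "slope": float(model.coef_[0]),
        "mse": float(mean_squared_error(y, pred)),
        "r2": float(r2_score(y, pred)),
    }


# Toy data lying exactly on the line y = 1 + 2x (made-up values).
toy_x = [1.0, 2.0, 3.0, 4.0]
toy_y = [3.0, 5.0, 7.0, 9.0]
summary = fit_simple_regression(toy_x, toy_y)
print(summary)
```

With the real data, a call such as `fit_simple_regression(china.co2, china.AverageTemperature)` would return the same four numbers that the cells below print one by one. Note that these scores are computed on the same data used for fitting, so they describe fit quality rather than predictive performance.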
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.metrics import mean_squared_error, r2_score
# linear regression for China (co2)
x_cn_co2 = china.co2.to_numpy().reshape(-1,1)
regr_cn_co2 = linear_model.LinearRegression()
regr_cn_co2.fit(x_cn_co2, china.AverageTemperature)
avgtemp_pred_cn_co2 = regr_cn_co2.predict(x_cn_co2)
china.plot(x='co2', y='AverageTemperature', kind = 'scatter')
plt.plot(china.co2, avgtemp_pred_cn_co2, color='blue', linewidth=3)
plt.show()
# The Intercept
print('Intercept: \n', regr_cn_co2.intercept_)
# The coefficients
print('Coefficients: \n', regr_cn_co2.coef_)
# The mean squared error
print('Mean squared error: %.2f'
% mean_squared_error(china.AverageTemperature, avgtemp_pred_cn_co2))
# The coefficient of determination: 1 is perfect prediction
print('Coefficient of determination: %.2f'
% r2_score(china.AverageTemperature, avgtemp_pred_cn_co2))
# linear regression for China (methane)
x_cn_ch4 = china.methane.to_numpy().reshape(-1,1)
regr_cn_ch4 = linear_model.LinearRegression()
regr_cn_ch4.fit(x_cn_ch4, china.AverageTemperature)
avgtemp_pred_cn_ch4 = regr_cn_ch4.predict(x_cn_ch4)
china.plot(x='methane', y='AverageTemperature', kind = 'scatter')
plt.plot(china.methane, avgtemp_pred_cn_ch4, color='blue', linewidth=3)
plt.show()
# The Intercept
print('Intercept: \n', regr_cn_ch4.intercept_)
# The coefficients
print('Coefficients: \n', regr_cn_ch4.coef_)
# The mean squared error
print('Mean squared error: %.2f'
% mean_squared_error(china.AverageTemperature, avgtemp_pred_cn_ch4))
# The coefficient of determination: 1 is perfect prediction
print('Coefficient of determination: %.2f'
% r2_score(china.AverageTemperature, avgtemp_pred_cn_ch4))
# linear regression for China (nitrous oxide)
x_cn_n2o = china.nitrous_oxide.to_numpy().reshape(-1,1)
regr_cn_n2o = linear_model.LinearRegression()
regr_cn_n2o.fit(x_cn_n2o, china.AverageTemperature)
avgtemp_pred_cn_n2o = regr_cn_n2o.predict(x_cn_n2o)
china.plot(x='nitrous_oxide', y='AverageTemperature', kind = 'scatter')
plt.plot(china.nitrous_oxide, avgtemp_pred_cn_n2o, color='blue', linewidth=3)
plt.show()
# The Intercept
print('Intercept: \n', regr_cn_n2o.intercept_)
# The coefficients
print('Coefficients: \n', regr_cn_n2o.coef_)
# The mean squared error
print('Mean squared error: %.2f'
% mean_squared_error(china.AverageTemperature, avgtemp_pred_cn_n2o))
# The coefficient of determination: 1 is perfect prediction
print('Coefficient of determination: %.2f'
% r2_score(china.AverageTemperature, avgtemp_pred_cn_n2o))
# linear regression for United States (co2)
x_us_co2 = us.co2.to_numpy().reshape(-1,1)
regr_us_co2 = linear_model.LinearRegression()
regr_us_co2.fit(x_us_co2, us.AverageTemperature)
avgtemp_pred_us_co2 = regr_us_co2.predict(x_us_co2)
us.plot(x='co2', y='AverageTemperature', kind = 'scatter')
plt.plot(us.co2, avgtemp_pred_us_co2, color='blue', linewidth=3)
plt.show()
# The Intercept
print('Intercept: \n', regr_us_co2.intercept_)
# The coefficients
print('Coefficients: \n', regr_us_co2.coef_)
# The mean squared error
print('Mean squared error: %.2f'
% mean_squared_error(us.AverageTemperature, avgtemp_pred_us_co2))
# The coefficient of determination: 1 is perfect prediction
print('Coefficient of determination: %.2f'
% r2_score(us.AverageTemperature, avgtemp_pred_us_co2))
# linear regression for United States (methane)
x_us_ch4 = us.methane.to_numpy().reshape(-1,1)
regr_us_ch4 = linear_model.LinearRegression()
regr_us_ch4.fit(x_us_ch4, us.AverageTemperature)
avgtemp_pred_us_ch4 = regr_us_ch4.predict(x_us_ch4)
us.plot(x='methane', y='AverageTemperature', kind = 'scatter')
plt.plot(us.methane, avgtemp_pred_us_ch4, color='blue', linewidth=3)
plt.show()
# The Intercept
print('Intercept: \n', regr_us_ch4.intercept_)
# The coefficients
print('Coefficients: \n', regr_us_ch4.coef_)
# The mean squared error
print('Mean squared error: %.2f'
% mean_squared_error(us.AverageTemperature, avgtemp_pred_us_ch4))
# The coefficient of determination: 1 is perfect prediction
print('Coefficient of determination: %.2f'
% r2_score(us.AverageTemperature, avgtemp_pred_us_ch4))
# linear regression for United States (nitrous oxide)
x_us_n2o = us.nitrous_oxide.to_numpy().reshape(-1,1)
regr_us_n2o = linear_model.LinearRegression()
regr_us_n2o.fit(x_us_n2o, us.AverageTemperature)
avgtemp_pred_us_n2o = regr_us_n2o.predict(x_us_n2o)
us.plot(x='nitrous_oxide', y='AverageTemperature', kind = 'scatter')
plt.plot(us.nitrous_oxide, avgtemp_pred_us_n2o, color='blue', linewidth=3)
plt.show()
# The Intercept
print('Intercept: \n', regr_us_n2o.intercept_)
# The coefficients
print('Coefficients: \n', regr_us_n2o.coef_)
# The mean squared error
print('Mean squared error: %.2f'
% mean_squared_error(us.AverageTemperature, avgtemp_pred_us_n2o))
# The coefficient of determination: 1 is perfect prediction
print('Coefficient of determination: %.2f'
% r2_score(us.AverageTemperature, avgtemp_pred_us_n2o))
# linear regression for Russia (co2)
x_rs_co2 = russia.co2.to_numpy().reshape(-1,1)
regr_rs_co2 = linear_model.LinearRegression()
regr_rs_co2.fit(x_rs_co2, russia.AverageTemperature)
avgtemp_pred_rs_co2 = regr_rs_co2.predict(x_rs_co2)
russia.plot(x='co2', y='AverageTemperature', kind = 'scatter')
plt.plot(russia.co2, avgtemp_pred_rs_co2, color='blue', linewidth=3)
plt.show()
# The Intercept
print('Intercept: \n', regr_rs_co2.intercept_)
# The coefficients
print('Coefficients: \n', regr_rs_co2.coef_)
# The mean squared error
print('Mean squared error: %.2f'
% mean_squared_error(russia.AverageTemperature, avgtemp_pred_rs_co2))
# The coefficient of determination: 1 is perfect prediction
print('Coefficient of determination: %.2f'
% r2_score(russia.AverageTemperature, avgtemp_pred_rs_co2))
# linear regression for Russia (methane)
x_rs_ch4 = russia.methane.to_numpy().reshape(-1,1)
regr_rs_ch4 = linear_model.LinearRegression()
regr_rs_ch4.fit(x_rs_ch4, russia.AverageTemperature)
avgtemp_pred_rs_ch4 = regr_rs_ch4.predict(x_rs_ch4)
russia.plot(x='methane', y='AverageTemperature', kind = 'scatter')
plt.plot(russia.methane, avgtemp_pred_rs_ch4, color='blue', linewidth=3)
plt.show()
# The Intercept
print('Intercept: \n', regr_rs_ch4.intercept_)
# The coefficients
print('Coefficients: \n', regr_rs_ch4.coef_)
# The mean squared error
print('Mean squared error: %.2f'
% mean_squared_error(russia.AverageTemperature, avgtemp_pred_rs_ch4))
# The coefficient of determination: 1 is perfect prediction
print('Coefficient of determination: %.2f'
% r2_score(russia.AverageTemperature, avgtemp_pred_rs_ch4))
# linear regression for Russia (nitrous oxide)
x_rs_n2o = russia.nitrous_oxide.to_numpy().reshape(-1,1)
regr_rs_n2o = linear_model.LinearRegression()
regr_rs_n2o.fit(x_rs_n2o, russia.AverageTemperature)
avgtemp_pred_rs_n2o = regr_rs_n2o.predict(x_rs_n2o)
russia.plot(x='nitrous_oxide', y='AverageTemperature', kind = 'scatter')
plt.plot(russia.nitrous_oxide, avgtemp_pred_rs_n2o, color='blue', linewidth=3)
plt.show()
# The Intercept
print('Intercept: \n', regr_rs_n2o.intercept_)
# The coefficients
print('Coefficients: \n', regr_rs_n2o.coef_)
# The mean squared error
print('Mean squared error: %.2f'
% mean_squared_error(russia.AverageTemperature, avgtemp_pred_rs_n2o))
# The coefficient of determination: 1 is perfect prediction
print('Coefficient of determination: %.2f'
% r2_score(russia.AverageTemperature, avgtemp_pred_rs_n2o))
# linear regression for Germany (co2)
x_gr_co2 = germany.co2.to_numpy().reshape(-1,1)
regr_gr_co2 = linear_model.LinearRegression()
regr_gr_co2.fit(x_gr_co2, germany.AverageTemperature)
avgtemp_pred_gr_co2 = regr_gr_co2.predict(x_gr_co2)
germany.plot(x='co2', y='AverageTemperature', kind = 'scatter')
plt.plot(germany.co2, avgtemp_pred_gr_co2, color='blue', linewidth=3)
plt.show()
# The Intercept
print('Intercept: \n', regr_gr_co2.intercept_)
# The coefficients
print('Coefficients: \n', regr_gr_co2.coef_)
# The mean squared error
print('Mean squared error: %.2f'
% mean_squared_error(germany.AverageTemperature, avgtemp_pred_gr_co2))
# The coefficient of determination: 1 is perfect prediction
print('Coefficient of determination: %.2f'
% r2_score(germany.AverageTemperature, avgtemp_pred_gr_co2))
# linear regression for Germany (methane)
x_gr_ch4 = germany.methane.to_numpy().reshape(-1,1)
regr_gr_ch4 = linear_model.LinearRegression()
regr_gr_ch4.fit(x_gr_ch4, germany.AverageTemperature)
avgtemp_pred_gr_ch4 = regr_gr_ch4.predict(x_gr_ch4)
germany.plot(x='methane', y='AverageTemperature', kind = 'scatter')
plt.plot(germany.methane, avgtemp_pred_gr_ch4, color='blue', linewidth=3)
plt.show()
# The Intercept
print('Intercept: \n', regr_gr_ch4.intercept_)
# The coefficients
print('Coefficients: \n', regr_gr_ch4.coef_)
# The mean squared error
print('Mean squared error: %.2f'
% mean_squared_error(germany.AverageTemperature, avgtemp_pred_gr_ch4))
# The coefficient of determination: 1 is perfect prediction
print('Coefficient of determination: %.2f'
% r2_score(germany.AverageTemperature, avgtemp_pred_gr_ch4))
# linear regression for Germany (nitrous oxide)
x_gr_n2o = germany.nitrous_oxide.to_numpy().reshape(-1,1)
regr_gr_n2o = linear_model.LinearRegression()
regr_gr_n2o.fit(x_gr_n2o, germany.AverageTemperature)
avgtemp_pred_gr_n2o = regr_gr_n2o.predict(x_gr_n2o)
germany.plot(x='nitrous_oxide', y='AverageTemperature', kind = 'scatter')
plt.plot(germany.nitrous_oxide, avgtemp_pred_gr_n2o, color='blue', linewidth=3)
plt.show()
# The Intercept
print('Intercept: \n', regr_gr_n2o.intercept_)
# The coefficients
print('Coefficients: \n', regr_gr_n2o.coef_)
# The mean squared error
print('Mean squared error: %.2f'
% mean_squared_error(germany.AverageTemperature, avgtemp_pred_gr_n2o))
# The coefficient of determination: 1 is perfect prediction
print('Coefficient of determination: %.2f'
% r2_score(germany.AverageTemperature, avgtemp_pred_gr_n2o))
# linear regression for United Kingdom (co2)
x_uk_co2 = uk.co2.to_numpy().reshape(-1,1)
regr_uk_co2 = linear_model.LinearRegression()
regr_uk_co2.fit(x_uk_co2, uk.AverageTemperature)
avgtemp_pred_uk_co2 = regr_uk_co2.predict(x_uk_co2)
uk.plot(x='co2', y='AverageTemperature', kind = 'scatter')
plt.plot(uk.co2, avgtemp_pred_uk_co2, color='blue', linewidth=3)
plt.show()
# The Intercept
print('Intercept: \n', regr_uk_co2.intercept_)
# The coefficients
print('Coefficients: \n', regr_uk_co2.coef_)
# The mean squared error
print('Mean squared error: %.2f'
% mean_squared_error(uk.AverageTemperature, avgtemp_pred_uk_co2))
# The coefficient of determination: 1 is perfect prediction
print('Coefficient of determination: %.2f'
% r2_score(uk.AverageTemperature, avgtemp_pred_uk_co2))
# linear regression for United Kingdom (methane)
x_uk_ch4 = uk.methane.to_numpy().reshape(-1,1)
regr_uk_ch4 = linear_model.LinearRegression()
regr_uk_ch4.fit(x_uk_ch4, uk.AverageTemperature)
avgtemp_pred_uk_ch4 = regr_uk_ch4.predict(x_uk_ch4)
uk.plot(x='methane', y='AverageTemperature', kind = 'scatter')
plt.plot(uk.methane, avgtemp_pred_uk_ch4, color='blue', linewidth=3)
plt.show()
# The Intercept
print('Intercept: \n', regr_uk_ch4.intercept_)
# The coefficients
print('Coefficients: \n', regr_uk_ch4.coef_)
# The mean squared error
print('Mean squared error: %.2f'
% mean_squared_error(uk.AverageTemperature, avgtemp_pred_uk_ch4))
# The coefficient of determination: 1 is perfect prediction
print('Coefficient of determination: %.2f'
% r2_score(uk.AverageTemperature, avgtemp_pred_uk_ch4))
# linear regression for United Kingdom (nitrous oxide)
x_uk_n2o = uk.nitrous_oxide.to_numpy().reshape(-1,1)
regr_uk_n2o = linear_model.LinearRegression()
regr_uk_n2o.fit(x_uk_n2o, uk.AverageTemperature)
# predict with the nitrous oxide model fitted above (not the methane model)
avgtemp_pred_uk_n2o = regr_uk_n2o.predict(x_uk_n2o)
uk.plot(x='nitrous_oxide', y='AverageTemperature', kind = 'scatter')
plt.plot(uk.nitrous_oxide, avgtemp_pred_uk_n2o, color='blue', linewidth=3)
plt.show()
# The Intercept
print('Intercept: \n', regr_uk_n2o.intercept_)
# The coefficients
print('Coefficients: \n', regr_uk_n2o.coef_)
# The mean squared error
print('Mean squared error: %.2f'
% mean_squared_error(uk.AverageTemperature, avgtemp_pred_uk_n2o))
# The coefficient of determination: 1 is perfect prediction
print('Coefficient of determination: %.2f'
% r2_score(uk.AverageTemperature, avgtemp_pred_uk_n2o))
# linear regression for Japan (co2)
x_jp_co2 = japan.co2.to_numpy().reshape(-1,1)
regr_jp_co2 = linear_model.LinearRegression()
regr_jp_co2.fit(x_jp_co2, japan.AverageTemperature)
avgtemp_pred_jp_co2 = regr_jp_co2.predict(x_jp_co2)
japan.plot(x='co2', y='AverageTemperature', kind = 'scatter')
plt.plot(japan.co2, avgtemp_pred_jp_co2, color='blue', linewidth=3)
plt.show()
# The Intercept
print('Intercept: \n', regr_jp_co2.intercept_)
# The coefficients
print('Coefficients: \n', regr_jp_co2.coef_)
# The mean squared error
print('Mean squared error: %.2f'
% mean_squared_error(japan.AverageTemperature, avgtemp_pred_jp_co2))
# The coefficient of determination: 1 is perfect prediction
print('Coefficient of determination: %.2f'
% r2_score(japan.AverageTemperature, avgtemp_pred_jp_co2))
# linear regression for Japan (methane)
x_jp_ch4 = japan.methane.to_numpy().reshape(-1,1)
regr_jp_ch4 = linear_model.LinearRegression()
regr_jp_ch4.fit(x_jp_ch4, japan.AverageTemperature)
avgtemp_pred_jp_ch4 = regr_jp_ch4.predict(x_jp_ch4)
japan.plot(x='methane', y='AverageTemperature', kind = 'scatter')
plt.plot(japan.methane, avgtemp_pred_jp_ch4, color='blue', linewidth=3)
plt.show()
# The Intercept
print('Intercept: \n', regr_jp_ch4.intercept_)
# The coefficients
print('Coefficients: \n', regr_jp_ch4.coef_)
# The mean squared error
print('Mean squared error: %.2f'
% mean_squared_error(japan.AverageTemperature, avgtemp_pred_jp_ch4))
# The coefficient of determination: 1 is perfect prediction
print('Coefficient of determination: %.2f'
% r2_score(japan.AverageTemperature, avgtemp_pred_jp_ch4))
# linear regression for Japan (nitrous oxide)
x_jp_n2o = japan.nitrous_oxide.to_numpy().reshape(-1,1)
regr_jp_n2o = linear_model.LinearRegression()
regr_jp_n2o.fit(x_jp_n2o, japan.AverageTemperature)
avgtemp_pred_jp_n2o = regr_jp_n2o.predict(x_jp_n2o)
japan.plot(x='nitrous_oxide', y='AverageTemperature', kind = 'scatter')
plt.plot(japan.nitrous_oxide, avgtemp_pred_jp_n2o, color='blue', linewidth=3)
plt.show()
# The Intercept
print('Intercept: \n', regr_jp_n2o.intercept_)
# The coefficients
print('Coefficients: \n', regr_jp_n2o.coef_)
# The mean squared error
print('Mean squared error: %.2f'
% mean_squared_error(japan.AverageTemperature, avgtemp_pred_jp_n2o))
# The coefficient of determination: 1 is perfect prediction
print('Coefficient of determination: %.2f'
% r2_score(japan.AverageTemperature, avgtemp_pred_jp_n2o))
# linear regression for India (co2)
x_in_co2 = india.co2.to_numpy().reshape(-1,1)
regr_in_co2 = linear_model.LinearRegression()
regr_in_co2.fit(x_in_co2, india.AverageTemperature)
avgtemp_pred_in_co2 = regr_in_co2.predict(x_in_co2)
india.plot(x='co2', y='AverageTemperature', kind = 'scatter')
plt.plot(india.co2, avgtemp_pred_in_co2, color='blue', linewidth=3)
plt.show()
# The Intercept
print('Intercept: \n', regr_in_co2.intercept_)
# The coefficients
print('Coefficients: \n', regr_in_co2.coef_)
# The mean squared error
print('Mean squared error: %.2f'
% mean_squared_error(india.AverageTemperature, avgtemp_pred_in_co2))
# The coefficient of determination: 1 is perfect prediction
print('Coefficient of determination: %.2f'
% r2_score(india.AverageTemperature, avgtemp_pred_in_co2))
# linear regression for India (methane)
x_in_ch4 = india.methane.to_numpy().reshape(-1,1)
regr_in_ch4 = linear_model.LinearRegression()
regr_in_ch4.fit(x_in_ch4, india.AverageTemperature)
avgtemp_pred_in_ch4 = regr_in_ch4.predict(x_in_ch4)
india.plot(x='methane', y='AverageTemperature', kind = 'scatter')
plt.plot(india.methane, avgtemp_pred_in_ch4, color='blue', linewidth=3)
plt.show()
# The Intercept
print('Intercept: \n', regr_in_ch4.intercept_)
# The coefficients
print('Coefficients: \n', regr_in_ch4.coef_)
# The mean squared error
print('Mean squared error: %.2f'
% mean_squared_error(india.AverageTemperature, avgtemp_pred_in_ch4))
# The coefficient of determination: 1 is perfect prediction
print('Coefficient of determination: %.2f'
% r2_score(india.AverageTemperature, avgtemp_pred_in_ch4))
# linear regression for India (nitrous oxide)
x_in_n2o = india.nitrous_oxide.to_numpy().reshape(-1,1)
regr_in_n2o = linear_model.LinearRegression()
regr_in_n2o.fit(x_in_n2o, india.AverageTemperature)
avgtemp_pred_in_n2o = regr_in_n2o.predict(x_in_n2o)
india.plot(x='nitrous_oxide', y='AverageTemperature', kind = 'scatter')
plt.plot(india.nitrous_oxide, avgtemp_pred_in_n2o, color='blue', linewidth=3)
plt.show()
# The Intercept
print('Intercept: \n', regr_in_n2o.intercept_)
# The coefficients
print('Coefficients: \n', regr_in_n2o.coef_)
# The mean squared error
print('Mean squared error: %.2f'
% mean_squared_error(india.AverageTemperature, avgtemp_pred_in_n2o))
# The coefficient of determination: 1 is perfect prediction
print('Coefficient of determination: %.2f'
% r2_score(india.AverageTemperature, avgtemp_pred_in_n2o))
# linear regression for Indonesia (co2)
x_id_co2 = indonesia.co2.to_numpy().reshape(-1,1)
regr_id_co2 = linear_model.LinearRegression()
regr_id_co2.fit(x_id_co2, indonesia.AverageTemperature)
avgtemp_pred_id_co2 = regr_id_co2.predict(x_id_co2)
indonesia.plot(x='co2', y='AverageTemperature', kind = 'scatter')
plt.plot(indonesia.co2, avgtemp_pred_id_co2, color='blue', linewidth=3)
plt.show()
# The Intercept
print('Intercept: \n', regr_id_co2.intercept_)
# The coefficients
print('Coefficients: \n', regr_id_co2.coef_)
# The mean squared error
print('Mean squared error: %.2f'
% mean_squared_error(indonesia.AverageTemperature, avgtemp_pred_id_co2))
# The coefficient of determination: 1 is perfect prediction
print('Coefficient of determination: %.2f'
% r2_score(indonesia.AverageTemperature, avgtemp_pred_id_co2))
# linear regression for Indonesia (methane)
x_id_ch4 = indonesia.methane.to_numpy().reshape(-1,1)
regr_id_ch4 = linear_model.LinearRegression()
regr_id_ch4.fit(x_id_ch4, indonesia.AverageTemperature)
avgtemp_pred_id_ch4 = regr_id_ch4.predict(x_id_ch4)
indonesia.plot(x='methane', y='AverageTemperature', kind = 'scatter')
plt.plot(indonesia.methane, avgtemp_pred_id_ch4, color='blue', linewidth=3)
plt.show()
# The Intercept
print('Intercept: \n', regr_id_ch4.intercept_)
# The coefficients
print('Coefficients: \n', regr_id_ch4.coef_)
# The mean squared error
print('Mean squared error: %.2f'
% mean_squared_error(indonesia.AverageTemperature, avgtemp_pred_id_ch4))
# The coefficient of determination: 1 is perfect prediction
print('Coefficient of determination: %.2f'
% r2_score(indonesia.AverageTemperature, avgtemp_pred_id_ch4))
# linear regression for Indonesia (nitrous oxide)
x_id_n2o = indonesia.nitrous_oxide.to_numpy().reshape(-1,1)
regr_id_n2o = linear_model.LinearRegression()
regr_id_n2o.fit(x_id_n2o, indonesia.AverageTemperature)
avgtemp_pred_id_n2o = regr_id_n2o.predict(x_id_n2o)
indonesia.plot(x='nitrous_oxide', y='AverageTemperature', kind = 'scatter')
plt.plot(indonesia.nitrous_oxide, avgtemp_pred_id_n2o, color='blue', linewidth=3)
plt.show()
# The Intercept
print('Intercept: \n', regr_id_n2o.intercept_)
# The coefficients
print('Coefficients: \n', regr_id_n2o.coef_)
# The mean squared error
print('Mean squared error: %.2f'
% mean_squared_error(indonesia.AverageTemperature, avgtemp_pred_id_n2o))
# The coefficient of determination: 1 is perfect prediction
print('Coefficient of determination: %.2f'
% r2_score(indonesia.AverageTemperature, avgtemp_pred_id_n2o))
# linear regression for Brazil (co2)
x_bz_co2 = brazil.co2.to_numpy().reshape(-1,1)
regr_bz_co2 = linear_model.LinearRegression()
regr_bz_co2.fit(x_bz_co2, brazil.AverageTemperature)
avgtemp_pred_bz_co2 = regr_bz_co2.predict(x_bz_co2)
brazil.plot(x='co2', y='AverageTemperature', kind = 'scatter')
plt.plot(brazil.co2, avgtemp_pred_bz_co2, color='blue', linewidth=3)
plt.show()
# The Intercept
print('Intercept: \n', regr_bz_co2.intercept_)
# The coefficients
print('Coefficients: \n', regr_bz_co2.coef_)
# The mean squared error
print('Mean squared error: %.2f'
% mean_squared_error(brazil.AverageTemperature, avgtemp_pred_bz_co2))
# The coefficient of determination: 1 is perfect prediction
print('Coefficient of determination: %.2f'
% r2_score(brazil.AverageTemperature, avgtemp_pred_bz_co2))
# linear regression for Brazil (methane)
x_bz_ch4 = brazil.methane.to_numpy().reshape(-1,1)
regr_bz_ch4 = linear_model.LinearRegression()
regr_bz_ch4.fit(x_bz_ch4, brazil.AverageTemperature)
avgtemp_pred_bz_ch4 = regr_bz_ch4.predict(x_bz_ch4)
brazil.plot(x='methane', y='AverageTemperature', kind = 'scatter')
plt.plot(brazil.methane, avgtemp_pred_bz_ch4, color='blue', linewidth=3)
plt.show()
# The Intercept
print('Intercept: \n', regr_bz_ch4.intercept_)
# The coefficients
print('Coefficients: \n', regr_bz_ch4.coef_)
# The mean squared error
print('Mean squared error: %.2f'
% mean_squared_error(brazil.AverageTemperature, avgtemp_pred_bz_ch4))
# The coefficient of determination: 1 is perfect prediction
print('Coefficient of determination: %.2f'
% r2_score(brazil.AverageTemperature, avgtemp_pred_bz_ch4))
# linear regression for Brazil (nitrous oxide)
x_bz_n2o = brazil.nitrous_oxide.to_numpy().reshape(-1,1)
regr_bz_n2o = linear_model.LinearRegression()
regr_bz_n2o.fit(x_bz_n2o, brazil.AverageTemperature)
avgtemp_pred_bz_n2o = regr_bz_n2o.predict(x_bz_n2o)
brazil.plot(x='nitrous_oxide', y='AverageTemperature', kind = 'scatter')
plt.plot(brazil.nitrous_oxide, avgtemp_pred_bz_n2o, color='blue', linewidth=3)
plt.show()
# The Intercept
print('Intercept: \n', regr_bz_n2o.intercept_)
# The coefficients
print('Coefficients: \n', regr_bz_n2o.coef_)
# The mean squared error
print('Mean squared error: %.2f'
% mean_squared_error(brazil.AverageTemperature, avgtemp_pred_bz_n2o))
# The coefficient of determination: 1 is perfect prediction
print('Coefficient of determination: %.2f'
% r2_score(brazil.AverageTemperature, avgtemp_pred_bz_n2o))
After looking at all the plots, we can see that almost all of our target countries show a fairly weak linear relationship between emissions and average temperature. For China, the r-square for co2 is 0.13 with a very small positive coefficient, for methane 0.13 with a small negative coefficient, and for nitrous oxide 0.12 with a small positive coefficient. For the United States, the r-square for co2 is 0.10 with a small positive coefficient, for methane 0.08 with a small negative coefficient, and for nitrous oxide 0.09 with a small negative coefficient. Russia has relatively higher r-square values: 0.31 for co2 with a small negative coefficient, 0.29 for methane with a small positive coefficient, and 0.31 for nitrous oxide with a small negative coefficient. Germany also has relatively higher r-square values: 0.36 for co2, 0.36 for methane, and 0.35 for nitrous oxide, each with a small negative coefficient. The same holds for the United Kingdom: 0.24 for co2, 0.23 for methane, and 0.29 for nitrous oxide, each with a small negative coefficient. For Japan, the r-square is 0.17 for co2, 0.16 for methane, and 0.16 for nitrous oxide, each with a small negative coefficient. For India, the r-square is 0.04 for co2, methane, and nitrous oxide alike, each with a small positive coefficient. For Indonesia, the r-square is 0.03 for co2, 0.03 for methane, and 0.02 for nitrous oxide, each with a small positive coefficient. For Brazil, the r-square is 0.06 for co2, 0.05 for methane, and 0.05 for nitrous oxide, each with a small positive coefficient.
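The numbers above are read off one block at a time. As a minimal sketch, the same four quantities can be gathered into one table per country with a small helper. The function name `fit_summary` and the `demo` frame are hypothetical and the values are synthetic, not taken from our datasets; only the column names (`co2`, `methane`, `nitrous_oxide`, `AverageTemperature`) follow the tutorial.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

def fit_summary(df, gas, target="AverageTemperature"):
    """Fit target ~ gas by ordinary least squares and return the key numbers."""
    x = df[gas].to_numpy().reshape(-1, 1)
    y = df[target].to_numpy()
    model = LinearRegression().fit(x, y)
    pred = model.predict(x)
    return {
        "gas": gas,
        "intercept": model.intercept_,
        "coef": model.coef_[0],
        "mse": mean_squared_error(y, pred),
        "r2": r2_score(y, pred),
    }

# Synthetic stand-in for one country's yearly rows (NOT real data).
rng = np.random.default_rng(0)
n_years = 23
demo = pd.DataFrame({
    "co2": np.linspace(500, 2000, n_years) + rng.normal(0, 50, n_years),
    "methane": np.linspace(600, 700, n_years) + rng.normal(0, 10, n_years),
    "nitrous_oxide": np.linspace(150, 250, n_years) + rng.normal(0, 10, n_years),
})
demo["AverageTemperature"] = 24 + 0.0002 * demo["co2"] + rng.normal(0, 0.3, n_years)

# One row per gas: intercept, coefficient, MSE and r-square side by side.
summary = pd.DataFrame(
    [fit_summary(demo, gas) for gas in ["co2", "methane", "nitrous_oxide"]]
)
print(summary.round(4))
```

Calling the helper on `india`, `indonesia`, or `brazil` instead of `demo` should give the same values as the individual blocks, assuming those frames carry the columns used above.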
We can see that many of the target countries have a negative coefficient; however, the r-square values we get are quite small. This suggests that our hypothesis may have to be rejected, but we have not yet obtained r-square values for every country, so in the next part we will compute them for all the countries in our dataset. We will display them on choropleth world maps, and by then we should be able to conclude whether our hypothesis stands, that is, whether annual greenhouse gas emissions and annual average temperature by country are related.
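When reading these two numbers together, it may help to keep in mind that the size of a coefficient depends on the units of the emissions column, while the r-square does not. The sketch below uses synthetic values (not our data) and assumes emissions recorded in million tonnes. It refits the same points after converting to billion tonnes, and also checks that for a single predictor the r-square equals the squared Pearson correlation. The helper `slope_and_r2` is hypothetical and uses `np.polyfit` in place of `LinearRegression`; for one predictor both give the same least-squares line.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic emissions (assumed million tonnes) and temperatures (degrees C); NOT real data.
emissions_mt = rng.uniform(500, 2000, size=23)
temperature = 15 - 0.0004 * emissions_mt + rng.normal(0, 0.4, size=23)

def slope_and_r2(x, y):
    """Least-squares line y = slope * x + intercept, returned with its r-square."""
    slope, intercept = np.polyfit(x, y, deg=1)
    pred = slope * x + intercept
    ss_res = np.sum((y - pred) ** 2)      # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)  # total sum of squares
    return slope, 1 - ss_res / ss_tot

slope_mt, r2_mt = slope_and_r2(emissions_mt, temperature)
# Same data, emissions expressed in billion tonnes instead.
slope_gt, r2_gt = slope_and_r2(emissions_mt / 1000, temperature)
pearson_r = np.corrcoef(emissions_mt, temperature)[0, 1]

print(f"slope per million tonnes: {slope_mt:.6f}")
print(f"slope per billion tonnes: {slope_gt:.6f}")
print(f"r-square in both units:   {r2_mt:.4f}, {r2_gt:.4f}")
print(f"squared Pearson r:        {pearson_r ** 2:.4f}")
```

The slope grows by a factor of 1000 after the unit change while the r-square stays the same, so a "small" coefficient alone says little about how strong the relationship is.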
# obtain the r-square and coefficient values for all the countries(co2) in our dataset
countries = np.unique(Combined_df['Country'])
coef_co2 = {}
rsq_co2 = {}
for country in countries:
    # fit a separate regression on each country's rows
    con = Combined_df[Combined_df['Country'] == country]
    x_all_co2 = con.co2.to_numpy().reshape(-1,1)
    regr_all_co2 = linear_model.LinearRegression()
    regr_all_co2.fit(x_all_co2, con.AverageTemperature)
    avgtemp_pred_all_co2 = regr_all_co2.predict(x_all_co2)
    coef_co2[country] = regr_all_co2.coef_[0]
    rsq_co2[country] = r2_score(con.AverageTemperature, avgtemp_pred_all_co2)
Combined_df['coef_co2'] = Combined_df['Country'].apply(lambda x: coef_co2[x])
Combined_df['rsq_co2'] = Combined_df['Country'].apply(lambda x: rsq_co2[x])
Combined_df.head(10)
# choropleth world map for all countries' co2 emissions r-square values
fig = px.choropleth(data_frame=Combined_df,
                    locations="CountryCode",
                    color="rsq_co2",
                    hover_name="Country",
                    color_continuous_scale='YlOrRd')
fig.show()
# choropleth world map for all countries' co2 emissions coefficient values
fig = px.choropleth(data_frame=Combined_df,
                    locations="CountryCode",
                    color="coef_co2",
                    hover_name="Country",
                    color_continuous_scale='YlOrRd')
fig.show()
# obtain the r-square and coefficient values for all the countries(methane) in our dataset
countries = np.unique(Combined_df['Country'])
coef_ch4 = {}
rsq_ch4 = {}
for country in countries:
    # fit a separate regression on each country's rows
    con = Combined_df[Combined_df['Country'] == country]
    x_all_ch4 = con.methane.to_numpy().reshape(-1,1)
    regr_all_ch4 = linear_model.LinearRegression()
    regr_all_ch4.fit(x_all_ch4, con.AverageTemperature)
    avgtemp_pred_all_ch4 = regr_all_ch4.predict(x_all_ch4)
    coef_ch4[country] = regr_all_ch4.coef_[0]
    rsq_ch4[country] = r2_score(con.AverageTemperature, avgtemp_pred_all_ch4)
Combined_df['coef_ch4'] = Combined_df['Country'].apply(lambda x: coef_ch4[x])
Combined_df['rsq_ch4'] = Combined_df['Country'].apply(lambda x: rsq_ch4[x])
Combined_df.head(10)
# choropleth world map for all countries' methane emissions r-square values
fig = px.choropleth(data_frame=Combined_df,
                    locations="CountryCode",
                    color="rsq_ch4",
                    hover_name="Country",
                    color_continuous_scale='YlOrRd')
fig.show()
# choropleth world map for all countries' methane emissions coefficient values
fig = px.choropleth(data_frame=Combined_df,
                    locations="CountryCode",
                    color="coef_ch4",
                    hover_name="Country",
                    color_continuous_scale='YlOrRd')
fig.show()
# obtain the r-square and coefficient values for all the countries(nitrous oxide) in our dataset
countries = np.unique(Combined_df['Country'])
coef_n2o = {}
rsq_n2o = {}
for country in countries:
    # fit a separate regression on each country's rows
    con = Combined_df[Combined_df['Country'] == country]
    x_all_n2o = con.nitrous_oxide.to_numpy().reshape(-1,1)
    regr_all_n2o = linear_model.LinearRegression()
    regr_all_n2o.fit(x_all_n2o, con.AverageTemperature)
    avgtemp_pred_all_n2o = regr_all_n2o.predict(x_all_n2o)
    coef_n2o[country] = regr_all_n2o.coef_[0]
    rsq_n2o[country] = r2_score(con.AverageTemperature, avgtemp_pred_all_n2o)
Combined_df['coef_n2o'] = Combined_df['Country'].apply(lambda x: coef_n2o[x])
Combined_df['rsq_n2o'] = Combined_df['Country'].apply(lambda x: rsq_n2o[x])
Combined_df.head(10)
# choropleth world map for all countries' nitrous oxide emissions r-square values
fig = px.choropleth(data_frame=Combined_df,
                    locations="CountryCode",
                    color="rsq_n2o",
                    hover_name="Country",
                    color_continuous_scale='YlOrRd')
fig.show()
# choropleth world map for all countries' nitrous oxide emissions coefficient values
fig = px.choropleth(data_frame=Combined_df,
                    locations="CountryCode",
                    color="coef_n2o",
                    hover_name="Country",
                    color_continuous_scale='YlOrRd')
fig.show()
After looking at the choropleth world maps, which show the r-square values and coefficients for all the countries in our dataset, we found results very similar to what we saw for our target countries. The r-square values are very low for most countries; even the relatively high ones peak at around 0.5-0.6, which is still fairly low. The coefficients are also generally very small. We have now gained enough confidence from our data analysis to draw conclusions in the next part.
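The impression taken from the maps can also be checked numerically. The sketch below is one possible way to do that, not the code used above: a hypothetical helper `fit_by_country` returns one row per country, and the r-square column is then summarised. The three country names and all values are synthetic placeholders, and `np.polyfit` stands in for `LinearRegression` (for a single predictor both give the same least-squares fit).

```python
import numpy as np
import pandas as pd

def fit_by_country(df, gas, target="AverageTemperature"):
    """Return one row per country with the least-squares slope and r-square of target ~ gas."""
    rows = []
    for country, grp in df.groupby("Country"):
        x = grp[gas].to_numpy(dtype=float)
        y = grp[target].to_numpy(dtype=float)
        slope, intercept = np.polyfit(x, y, deg=1)
        ss_res = np.sum((y - (slope * x + intercept)) ** 2)
        ss_tot = np.sum((y - y.mean()) ** 2)
        rows.append({"Country": country, "coef": slope, "rsq": 1 - ss_res / ss_tot})
    return pd.DataFrame(rows)

# Synthetic stand-in for Combined_df with three made-up countries (NOT real data).
rng = np.random.default_rng(2)
frames = []
for name, true_slope in [("Aland", 0.001), ("Bland", -0.0005), ("Cland", 0.0)]:
    co2 = rng.uniform(100, 1000, size=23)
    temp = 20 + true_slope * co2 + rng.normal(0, 0.3, size=23)
    frames.append(pd.DataFrame({"Country": name, "co2": co2, "AverageTemperature": temp}))
demo = pd.concat(frames, ignore_index=True)

per_country = fit_by_country(demo, "co2")
print(per_country)
print(per_country["rsq"].describe())
print("share of countries with r-square above 0.5:", (per_country["rsq"] > 0.5).mean())
```

Because the result has one row per country, it could also be merged with the country codes and passed to `px.choropleth`, rather than repeating the same value on every yearly row.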
This is the final stage of the data lifecycle, where we draw conclusions based on everything we did above with our data.
We will reject our hypothesis that there is a relationship between annual greenhouse gas emissions and annual average temperature by country. As we saw in the last stage, most of the r-square values are very low, which indicates that there is not much of a linear relationship.
Although we rejected our hypothesis, does this mean there is no relationship between greenhouse gas emissions and temperature? Absolutely not. There are always many approaches to a single problem, and many ways to examine two features. This is what makes data science and machine learning challenging and exciting: there is always more to learn beyond what has already been established.